Real-time pose estimation system using inertial and feature measurements

ABSTRACT

A hybrid estimator system using visual and inertial sensors for real-time pose tracking on devices with limited processing power using at least one processor, a memory, a storage and communications through a protocol and one or more than one software module for a hybrid estimator, real-time algorithm selection to process different measurements, statistical learning for these characteristics to compute the expected device computing cost of any strategy for allocating measurements to algorithms, and algorithm selection based on the statistical learning module.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. §119(e) of U.S.Provisional Patent Application Ser. No. 61/824,309, filed on May 16,2013, the contents of which are incorporated herein by reference intheir entirety.

FIELD OF THE INVENTION

The present invention relates to real-time pose tracking, and morespecifically to a hybrid estimator system using visual and inertialsensors for real-time pose tracking on devices with limited processingpower.

BACKGROUND

Currently, there are no adequate solutions to the problem of trackingthe 3D-pose of small, resource-constrained systems in unknownenvironments. Specifically, estimating the motion of miniature devices,similar in size to a mobile phone, is difficult. In contrast to mediumand large-scale systems, (e.g. mobile robots, UAVs, autonomous cars),small devices have limited computational capabilities and battery life,factors which make the pose-estimation problem challenging. In theabsence of GPS, the types of sensors that can be used for poseestimation in small-scale systems are quite restricted.

A key challenge in designing an estimator for this task is the limitedavailability of processing power. Feature detection algorithms can trackhundreds of features in images of natural scenes, but processing allthis data in real time is challenging, particularly for a small,resource-constrained device.

Additional difficulties arise when designing an estimator for anylocalization task. The computational efficiency of different methodsdepends strongly on the nature of each particular dataset. For instance,one algorithm may outperform all others when the environment isfeature-rich and the vehicle is moving slowly, while a differentalgorithm may be the fastest in feature-sparse environments under fastmotion. This makes algorithm selection a difficult task, for which nogeneral, systematic methods exist to date.

A huge number of methods have been proposed for pose estimation,however, existing methods typically consist of a single approach forprocessing the feature measurements (e.g., EKF-based SLAM, orsliding-window iterative minimization). As a result, they often do notgeneralize well to different environments, and are difficult to adapt tothe varying availability of computational resources.

Therefore there is a need for a hybrid-estimator system using visual andinertial sensors for real-time pose tracking on devices with limitedprocessing power.

SUMMARY

The present invention solves the problems of the prior art by providinga hybrid estimator system using sensors for real-time pose tracking ofmoving platforms. The system has a device with a processor, memory,storage and communications through a protocol to execute one or morethan one software module. The software modules comprise code executableon the processor for executing the hybrid estimator that uses aplurality of algorithms for processing measurements and selecting theconstituent algorithms of the hybrid estimator used to process eachmeasurement.

The system uses a module with non-statutory instructions for learningthe statistical characteristics of the measurements and the algorithmselection is based on the learning module. The system gathersinformation to compute the expected cost of strategies for processingmeasurements by different algorithms to reduce the computationalrequirements. These computations are solved by the system in real time.The system solves an optimization problem with an objective functionrepresenting the expected computation time to identify the preferredstrategy.

The hybrid estimator can then estimate a moving platform's trajectoryusing inertial measurements and the observations of features by one ormore than one sensors or by using inertial measurements and theobservations of features by a visual sensor.

The system of claim 1, where the system comprises non-transitoryinstructions to process each of the available feature measurements,where the feature measurements can be processed by including adescription of the features as variables to be estimated in the hybridestimator. Feature measurements can also be processed by using obtainedobservations in order to derive constraints, directly used for updatingthe pose estimates of the moving object.

The system has non-transitory instructions to improve the accuracy ofthe pose estimates, reduce the computational requirements of theestimator or both improve the accuracy of the pose estimates and reducethe computational requirements of the estimator during system operationto reduce the processing requirements.

The hybrid estimator has non-transitory instructions to determine whichof the plurality of methods is to be used for processing each of thefeature measurements to reduce the computational requirements of theestimator and to adapt the system to the characteristics of theenvironment, the trajectory of the moving object, and the availabilityof processing resources. The hybrid estimator determines the number offeatures that should be extracted from raw sensor data in order toadjust to the availability of computational resources.

The hybrid estimator constructs linearized approximations of thenonlinear mathematical models that describe the motion and the sensormeasurements in order to compute a description of the uncertainty of theestimates, where the linearization points are selected to preserve thesystem's observability properties. A unique estimate of certain statesis used in order to compute the linearization matrices for allmeasurement equations that involve each of the states. The estimates ofone or more than one state is used to compute the linearization matricesand can be modified by equal amounts, to reduce linearization inaccuracywhile preserving the system's observability properties.

The hybrid estimator can also implement a hybrid extended Kalman filter.The hybrid extended Kalman filter comprises an extended-Kalman filteralgorithm that includes feature states in the state vector, and asliding-window extended-Kalman-filter algorithm that includes states ofthe mobile platform in the state vector. The size of the sliding windowis selected to reduce the computational cost of the hybrid extendedKalman filter. The hybrid estimator module determines the choice ofalgorithm to process each individual feature depending on thedistribution of the feature track lengths of features. The hybridestimator also determines the optimal strategy for processing thefeature measurements by solving a one-variable optimization problemusing the information above and processes all available measurementswithout loss of localization information.

The plurality of algorithms, employ an extended Kalman filter, must havebounded computational complexity, irrespective of the duration of thetrajectory. The hybrid estimator also includes an extended Kalmanfilter-sliding-window iterative minimization algorithm, where the statevector contains a current IMU state as well as representations of thefeature positions. Features that leave the field of view are removedfrom the state vector leaving only the currently observed ones, to keepthe computations bounded. The hybrid estimator has otherextended-Kalman-filter algorithms that maintain a sliding window ofcamera poses in the state vector, and use the feature observations toapply probabilistic constraints between these poses. One of the otherextended Kalman filter is a multistate-constraint Kalman filter.

The algorithm used to process feature measurements is selected to havethe lowest computational cost. The hybrid filter is a combination ofboth the extended Kalman filter-sliding-window iterative minimizationand the multistate-constraint Kalman filter algorithms. The hybridfilter is a filter whose state vector contains the current IMU state, mcamera poses, and sk features, and determines whether a feature will beprocessed using the multistate-constraint Kalman filter algorithm, orwhether it will be included in the state vector and processed using theextended Kalman filter-sliding-window iterative minimization algorithm.

Also provided is method for using a hybrid estimator with visual andinertial sensors for real-time pose tracking on devices with limitedprocessing power. First, a method to be used by the system forprocessing each of the feature measurements is determined. Then, thesystem is adapted to the characteristics of the environment, thetrajectory of the moving object, and the availability of processingresources. Next, the number of features to be extracted from raw sensordata is determined in order to adjust to the availability ofcomputational resources. Finally, linearized approximations of thenonlinear mathematical models that describe the motion and the sensormeasurements are constructed.

Additionally, a method for making a hybrid estimator using visual andinertial sensors for real-time pose tracking on devices with limitedprocessing power. First, a device comprising at least one processor, amemory, a storage and communications through a protocol is provided.Next, one or more than one software module is provided. The software oneor more than one module are communicatively coupled to each other. Thesoftware modules provide a hybrid estimator, real-time algorithmselection to process different measurements, statistical learning forthese characteristics to compute the expected device computing cost ofany strategy for allocating measurements to algorithms, and algorithmselection based on the statistical learning module.

The method above also propagates the state vector and covariance matrixusing the IMU readings, determines when camera measurements and featuresare available, augments the state vector with the latest camera pose,determines if the features are to be processed using anmultistate-constraint Kalman filter algorithm, computing residuals andmeasurement Jacobian matrices, and form the residual and Jacobian matrixHk for features that are included in the state vector, update the statevector and covariance matrix using the residual and Jacobian matrix Hk,and initialize features tracked in all images of the sliding window,updating state management and removing the oldest camera poses from thestate vector.

If the features are to be processed using an multistate-constraintKalman filter algorithm, then the method calculates a residual andJacobian matrix for each feature to be processed, performs a Mahalanobisgating test, and forms a residual vector and a Jacobian matrix using allfeatures that passed the gating test.

State management is updated by removing sliding-window iterativeminimization features that are no longer tracked and changing the anchorpose for sliding-window iterative minimization features anchored at theoldest pose.

In addition, the method can also analyze the computations needed for thehybrid estimator and calculate the number of floating-point operationsper update of the hybrid algorithm.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects and advantages of the presentinvention will become better understood with regard to the followingdescription, appended claims, and accompanying figures where:

FIG. 1 is a diagram showing a distribution of feature track lengths in adata set;

FIG. 2 is a diagram showing a comparison of actual runtime versusfloating point operations of the system;

FIGS. 3 a and 3 b are diagrams showing the results of a Monte-Carlosimulation comparing timing performance on a laptop computer and rootmean square position accuracy of a hybrid filter, for changing values ofm on a laptop computer;

FIGS. 4 a and 4 b are diagrams showing of a Monte-Carlo simulationcomparing timing performance and root mean square position accuracy ofthe hybrid filter, for changing values of m on a mobile phone using adifferent data set than used in the simulation of FIG. 3;

FIG. 5 is a map showing the trajectory estimates of MSCKF, EKF-SLAM, andthe hybrid algorithm in the real-world example; and

FIG. 6 is a chart showing thresholds selected using the optimizationprocess for the real-world data set of FIG. 5;

FIG. 7 is a flowchart diagram of some steps of a method for using ahybrid estimator using visual and inertial sensors for real-time posetracking on devices with limited processing power, according to oneembodiment; and

FIG. 8 is a flowchart diagram of some steps of a method for making ahybrid estimator using visual and inertial sensors for real-time posetracking on devices with limited processing power, according to oneembodiment.

DETAILED DESCRIPTION

The present invention solves the limitations described above byproviding a new paradigm for the design of motion estimation algorithms.Specifically, the system uses a hybrid estimator that incorporates a oneor more than one of algorithms with different computationalcharacteristics. The system determines, in real time, the algorithm bywhich to process different measurements (e.g., different features).Because the preferred choice for each measurement will depend on thecharacteristics of sensor data, the system can employ statisticallearning of these characteristics. In one embodiment, the system gathersstatistical information to compute the expected cost of any strategy forallocating measurements to algorithms. To identify the preferredstrategy, the system solves, in real time, an optimization problem withan objective function of the expected computation time.

The system uses visual and inertial measurements, because cameras andinertial measurement units (IMUs) are small, lightweight, andinexpensive sensors that can operate in almost any environment. Thesystem estimates a moving platform's trajectory using the inertialmeasurements and the observations of naturally occurring point features.The system does not assume that a map of the area is available.

The present invention overcomes limitations of the prior art byproviding a hybrid estimator system using visual and inertial forreal-time pose tracking on devices with limited processing power. Thesystem uses a novel procedure and method for estimating the state (e.g.,the position, orientation, velocity) of a moving object (e.g., avehicle, handheld device, aircraft) by combining data from an inertialsensor and one or more than on additional sensors that provide featuremeasurements. One key advantage of the method is the use of one or morethan one different approaches for the processing of the featuremeasurements. The method used to process each of the available featuremeasurements is determined during system operation to improve theaccuracy of the pose estimates and/or reduce the computationalrequirements of the estimator. In one embodiment, the method is executedfor pose estimation using measurements from an inertial sensor and froma camera that observes “features” (e.g., points of interest) in thescene. The feature measurements can be processed by (i) explicitlyincluding them as variables to be estimated in the estimator, or (ii)using their observations in order to derive constraints, directly usedfor updating the pose estimates of the moving object.

There is also provided a method to determine which of the methods shouldbe used by the system for processing each of the feature measurements.This is used to reduce the computational requirements of the estimatorand to adapt the system to the characteristics of the environment, thetrajectory of the moving object, and the availability of processingresources.

Additionally, in one embodiment the system uses an adaptive approach todetermining the number of features that should be extracted from rawsensor data in order to adjust to the availability of computationalresources.

Finally, in order to compute an accurate description of the uncertaintyof the estimates, the method employs suitably constructed linearizedapproximations of the nonlinear mathematical models that describe themotion and the sensor measurements. By choosing linearization pointsthat preserve the system's observability properties, the system obtainshigh-precision, consistent estimates for the state of the moving object,at low computational cost.

Additionally, two other research papers by the inventors “Improving theAccuracy of EKF-Based Visual-Inertial Odometry” and “Vision-aidedInertial Navigation for Resource-constrained Systems” by Mingyang Li andAnastasios Mourikis that are hereby incorporated by reference in theirentirety.

All dimensions specified in this disclosure are by way of example onlyand are not intended to be limiting. Further, the proportions shown inthe Figures are not necessarily to scale. As will be understood by thosewith skill in the art with reference to this disclosure, the actualdimensions and proportions of any system, any device or part of a systemor device disclosed in this disclosure will be determined by itsintended use.

Systems and methods that implement the embodiments of the variousfeatures of the invention will now be described with reference to thedrawings. The drawings and the associated descriptions are provided toillustrate embodiments of the invention and not to limit the scope ofthe invention. Reference in the specification to “one embodiment” or “anembodiment” is intended to indicate that a particular feature,structure, or characteristic described in connection with the embodimentis included in at least an embodiment of the invention. The appearancesof the phrase “in one embodiment” or “an embodiment” in various placesin the specification are not necessarily all referring to the sameembodiment.

Throughout the drawings, reference numbers are re-used to indicatecorrespondence between referenced elements. In addition, the first digitof each reference number indicates the figure where the element firstappears.

As used in this disclosure, except where the context requires otherwise,the term “comprise” and variations of the term, such as “comprising”,“comprises” and “comprised” are not intended to exclude other additives,components, integers or steps.

In the following description, specific details are given to provide athorough understanding of the embodiments. However, it will beunderstood by one of ordinary skill in the art that the embodiments maybe practiced without these specific details. Well-known circuits,structures and techniques may not be shown in detail in order not toobscure the embodiments. For example, circuits may be shown in blockdiagrams in order not to obscure the embodiments in unnecessary detail.

Also, it is noted that the embodiments may be described as a processthat is depicted as a flowchart, a flow diagram, a structure diagram, ora block diagram. Although a flowchart may describe the operations as asequential process, many of the operations can be performed in parallelor concurrently. In addition, the order of the operations may berearranged. A process is terminated when its operations are completed. Aprocess may correspond to a method, a function, a procedure, asubroutine, a subprogram, etc. When a process corresponds to a function,its termination corresponds to a return of the function to the callingfunction or the main function.

Moreover, a storage may represent one or more devices for storing data,including read-only memory (ROM), random access memory (RAM), magneticdisk storage mediums, optical storage mediums, flash memory devicesand/or other machine readable mediums for storing information. The term“machine readable medium” includes, but is not limited to portable orfixed storage devices, optical storage devices, wireless channels andvarious other mediums capable of storing, containing or carryinginstruction(s) and/or data.

Furthermore, embodiments can be implemented by hardware, software,firmware, middleware, microcode, or a combination thereof. Whenimplemented in software, firmware, middleware or microcode, the programcode or code segments to perform the necessary tasks can be stored in amachine-readable medium such as a storage medium or other storage(s).One or more than one processor can perform the necessary tasks inseries, concurrently, distributed or in parallel. A code segment canrepresent a procedure, a function, a subprogram, a program, a routine, asubroutine, a module, a software package, a class, or a combination ofinstructions, data structures, or program statements. A code segment canbe coupled to another code segment or a hardware circuit by passingand/or receiving information, data, arguments, parameters, or memorycontents. Information, arguments, parameters, data, etc. can be passed,forwarded, or transmitted through a suitable means including memorysharing, message passing, token passing, network transmission, wirelesstransmission, etc.

The software identified above can also be constructed as a module. Themodule is a logically self-contained and discrete part of a largercomputer program, for example, a subroutine or a co-routine. Modules aretypically incorporated into the program through interfaces. The moduleinterface expresses the elements that are provided and required by themodule. The elements defined in the interface are detectable by othermodules. The implementation contains the executable code thatcorresponds to the elements declared in the interface. Modules performlogically discrete functions. The module can interact with other modulesof the system to achieve its purpose.

In the following description, certain terminology is used to describecertain features of one or more embodiments of the invention.

The term “visual-inertial odometry” refers to using inertialmeasurements from an IMU and the observations of naturally occurringpoint features from a visual sensor (e.g., a camera) to estimate thepose of a moving object in an environment without the use of a map.

Referring now to FIG. 1, there is shown a diagram 100 showingdistributions of feature track lengths in a data set. The system usesvisual-inertial odometry to solve problems in the prior art.Specifically, the system uses a hybrid extended Kalman filter (EKF),that integrates an extended Kalman filter for simultaneous localizationand mapping (EKF-SLAM) with a sliding-window EKF estimator. As explainedherein, these two estimators process the measurement information indifferent ways, and have different computational characteristics.

In one embodiment, the system determines the choice of algorithm toprocess each individual feature depending on the distribution of thefeature track lengths of all features. The distribution is not known inadvance, because it depends on the environment, the camera motion, aswell as the feature tracker used, and therefore the system learns itfrom an image sequence. Using this information, the optimal strategy forprocessing the feature measurements can be computed by solving aone-variable optimization problem. Results demonstrate that the hybridalgorithm outperforms each individual method (EKF-SLAM andsliding-window EKF) by a wide margin. In fact, the hybrid filter allowsprocessing the sensor data at real-time speed on a mobile phone,something that is not possible with either of the individual algorithms.Using the present invention, all the available measurements areprocessed, and no localization information is lost.

Since real-time performance is necessary, any candidate algorithm forthe system must have bounded computational complexity, irrespective ofthe duration of the trajectory. Within this class, practically allalgorithms proposed to date employ either an EKF, or iterativeminimization over a window of states. The latter approaches iterativelyrelinearize the measurement equations to better deal with theirnonlinearity, which, however, incurs a high computational cost. Bychoosing linearization points that preserve the system's observabilityproperties, an EKF-based system can attain better accuracy thaniterative-minimization methods, at only a fraction of the computationalcost. Specifically, the system can use a unique estimate of certainstates, such as, for example, the first available estimate, in order tocompute the linearization matrices (i.e., the Jacobians), for allmeasurement equations that involve each of the states. Optionally, theestimates of one or more than one state can be used to compute thelinearization matrices (i.e., the Jacobians) can be modified by equalamounts, to reduce linearization inaccuracy while preserving thesystem's observability properties.

Within the class of EKF methods, there are two possible formulations ofthe estimator. Specifically, the system can employ the EKF-SLAMapproach, where the state vector contains a current IMU state as well asfeature positions. To keep the computations bounded, features that leavethe field of view can be removed from the state vector, leaving only thecurrently observed ones. Other EKF algorithms maintain a sliding windowof camera poses in the state vector, and use the feature observations toapply probabilistic constraints between these poses. In a preferredembodiment, the system uses a multistate-constraint Kalman filter(MSCKF) algorithm that uses the all the information provided by thefeature measurements.

Both the MSCKF and EKF-SLAM use exactly the same information, and onlydiffer in the way they organize the computations, and in linearization.If the measurement models were linear, both methods would yield exactlythe same result, equal to the maximum-a-posteriori (MAP) estimate of theIMU pose. With respect to computational cost, however, the two methodsdiffer substantially. For the EKF-SLAM algorithm the cost at eachtime-step is cubic in the number of features, since all features in thestate vector are observed. In contrast, the MSCKF has computational costthat scales linearly in the number of features, but cubically in thelength of the feature tracks. Therefore, if many features are tracked,each in a small number of frames, the MSCKF approach is preferable, butif few features are tracked over long image sequences, EKF-SLAM wouldresult in lower computational cost. The system can integrate bothEKF-SLAM and the MSCKF algorithms in a single, hybrid filter due totheir complementary nature.

The system uses a modified MSCKF algorithm that attains increasedaccuracy and consistency by choosing linearization points that ensurethe correct observability properties. The MSCKF does not include thefeature positions in the state vector, but instead uses the featuremeasurements to directly impose constraints between the camera poses.The state vector of the MSCKF is:

x _(k) ^(T) =[x _(I) _(k) ^(T) x _(C) ₁ ^(T) x _(C) ₂ ^(T) . . . x _(C)_(m) ^(T)]^(T)  (Eq. 1)

where x_(Ik) is the current IMU state, and x_(Ci), i=1 . . . m are thecamera poses (positions and orientations) at the times the last m imageswere recorded. The IMU state is defined as:

x _(I) =[ q ^(T) p _(T) v ^(T) b _(g) ^(T) b _(n) ^(T)]^(T)  (Eq. 2)

where q is the unit quaternion describing the rotation from the globalframe to the IMU frame, p and v denote the IMU position and velocity,respectively, while b_(g) and b_(a) are the gyroscope and accelerometerbiases.

The MSCKF uses the IMU measurements to propagate the current IMU stateand the covariance matrix, P_(k+1|k), while the feature measurements areused for EKF updates. Let's assume that the i-th feature, that has beenobserved in l images, has just been lost from tracking (e.g., it wentout of the field of view). At this time, the MSCKF method uses all themeasurements of the feature to carry out an EKF update. Let theobservations of the feature be described by the nonlinear equationsz_(ij)=h(x_(Cj), f_(i))+n_(ij), for j=1 . . . l, where f_(i) is thefeature position (described by the inverse-depth parameterization) andn_(ij) is the noise vector, modeled as zero-mean Gaussian withcovariance matrix σ²I₂. Using all the measurements we compute a featureposition estimate {circumflex over (f)}_(i), and then compute theresiduals {tilde over (z)}_(ij)=z_(ij)−h({circumflex over (x)}_(C) _(j), {circumflex over (f)}_(i)), j=1 . . . l. By linearizing, theseresiduals can be written as:

{tilde over (z)} _(ij) ≅H _(ij) {tilde over (x)} _(C) _(i) +H _(f) _(ij){tilde over (f)} _(i) +n _(ij) , j=1 . . . l  (Eq. 3)

where {tilde over (z)}_(l) and {tilde over (f)}_(i) are the estimationerrors of the j-th camera pose and i-th feature respectively, and thematrices H_(ij) and H_(f) _(ij) are the corresponding Jacobians. Sincethe feature is not included in the MSCKF state vector, we proceed tomarginalize it out. For this purpose, the system first forms the vectorcontaining the l residuals from all the feature's measurements:

{tilde over (z)} _(i) ≅H _(i) {tilde over (x)}+H _(f) _(i) {tilde over(f)} _(i) +n _(i)  (Eq. 4)

where {tilde over (z)}_(l) and n_(i) are block vectors with elements and{tilde over (z)}_(ij) and n_(ij), respectively, and H_(i) and H_(f) _(i)are matrices with block rows H_(ij) and H_(f) _(ij) , for j=1 . . . l.Subsequently, we define the residual vector {tilde over (z)}_(i)^(o)=V^(T) ⁻ _(z) _(i) , where V is a matrix whose columns form a basisof the left nullspace of H_(f) _(i) . From (Eq. 4), we obtain:

{tilde over (z)} _(i) ^(o) =V ^(T) _({tilde over (z)}) _(i) ≅V ^(T) H_(i) {tilde over (x)}+V ^(T) n _(i) =H _(i) ^(o) {tilde over (x)}+n _(i)^(o)  (Eq. 5)

Once {tilde over (z)}_(l) ^(o) and H_(i) ^(o) available, we proceed tocarry out a Mahalanobis gating test for the residual {tilde over(z)}_(i) ^(o), by computing:

γ=({tilde over (z)} _(i) ^(o))^(T)(H _(i) ^(o) P _(k+1|k)(H _(i)^(o))^(T)+σ² I)⁻¹ {tilde over (z)} _(i) ^(o)  (Eq. 6)

and comparing it against a threshold computed based on the χ²distribution with _(2l−3) degrees of freedom. By stacking together theresiduals of all features that pass this gating test, we obtain:

{tilde over (z)} ^(o) =H ^(o) {tilde over (x)}+n ^(o)  (Eq. 7)

where {tilde over (z)}^(o), H^(o), and n^(o) are block vectors/matrices,with block rows {tilde over (z)}_(l) ^(o), H_(l) ^(o), and n_(i) for i=1. . . n, respectively. We can now use the above residual, which onlyinvolves the camera poses, for an EKF update. However, if a large numberof features are processed at each time instant (the common situation),further computational savings can be achieved. Specifically, instead ofusing the above residual, we can equivalently compute the thin QRfactorization of H _(o) , written as H^(o)=QH′, and then employ theresidual

for updates, defined as:

{tilde over (z)} ^(r) =Q ^(T) {tilde over (z)} ^(o) =H ^(T) {tilde over(x)}+n ^(r)  (Eq. 8)

where n^(r) is the r×1 noise vector, with covariance matrix σ²I_(r).Once the residual

and the matrix H^(r) have been computed, we compute the state correctionand update the covariance matrix via the standard EKF equations:

Δx=K{tilde over (z)} ^(r)  (Eq. 9)

P _(k+1|k+1) =P _(k+1|k) −KSK ^(T)  (Eq. 10)

S=H ^(r) P _(k+1|k)(H ^(T))^(T)+σ² I _(r)  (Eq. 11)

K=P _(k+1|k)(H ^(r))^(T) S ⁻¹  (Eq. 12)

The above-described method of processing feature measurements isoptimal, in the sense that no approximations are used, except for theEKF's linearization. This is true, however, only if the sliding windowof states, m, contains at least as many states as the longest featuretrack. If it does not, then the measurements that occurred more than mtimesteps in the past cannot be processed. Therefore, to use all theavailable feature information, the MSCKF must maintain a window ofstates long enough to include the longest feature tracks.

The computation time of the MSCKF is dominated by the followingoperations:

1) The Mahalanobis test for each feature, requiring

$O\left( {\sum\limits_{i = 1}^{n}\; _{i}^{3}} \right)$

operations.

2) The computation of the residual and the Jacobian matrix in (Eq. 8),which, by exploiting the structure in H^(o), can be implemented inapproximately

$O\left( {\sum\limits_{i = 1}^{n}\; _{i}^{3}} \right)$

operations.

3) The computation of the Kalman gain and the update of the covariancematrix, which require O(r³/6+r(15+6m)²) operations. Here 15+6m is thesize of the state covariance matrix, and r is the number of rows ofH^(r). It can be shown that, in general, r (which equals the number ofindependent constraints for the camera poses) is given by:

r=2(l ₍₁₎ +l ₍₂₎ +l ₍₃₎−7  (Eq. 13)

where l₍₁₎, l₍₂₎, l₍₃₎ are the three longest feature tracks. Althoughthe computational cost of the MSCKF is linear in the number of features,it is at least quadratic in the size of the sliding window, m. In fact,if the size of the sliding window is chosen to be equal to the longestfeature tracks, then r is also O(m), and the overall complexity becomescubic in m. This demonstrates a shortcoming of the MSCKF for preservingall measurement information: the complexity of the algorithm scalescubically as a function of the longest feature track length.

In real-world datasets the distribution of the feature-track lengths isnon-uniform, with many features tracked for short durations, and veryfew stable features tracked over longer periods. For example, FIG. 1shows the distribution of the feature track lengths in two parts of areal-world dataset. As can be seen, even though the longest featuretracks reach 40 images, the vast majority of features are tracked for asmall number of frames. The percentage of features tracked in five orless images is 88% and 69% in the two cases shown, respectively. To beable to process the small percentage of features with long tracklengths, the MSCKF must maintain a long sliding window. This, however,is computationally inefficient. Therefore, the system integrates theMSCKF algorithm with EKF-SLAM, to address this limitation.

Alternatively, the system can process feature measurements using the EKFby including the feature measurements in a state vector, and using theobservations as in the standard visual-SLAM formulation. As discussedearlier, this approach has computational properties complementary tothose of the MSCKF. While the MSCKF is better at processing manyfeatures tracked for a few frames, EKF-SLAM is faster when few featuresare tracked for many frames. The system combines both approaches in asingle, “hybrid” estimator. Specifically, the system uses a filter whosestate vector at time-step k contains the current IMU state, a slidingwindow of m camera poses, and s_(k) features:

x _(k) =[x _(l) _(k) ^(T) x _(C) ₁ ^(T) . . . x _(C) _(m) ^(T) f ₁ ^(T). . . f _(s) _(k) ^(T)]^(T)  (Eq. 14)

This provides the system with the ability to determine whether a featurewill be processed using the MSCKF approach, or whether it will beincluded in the state vector and processed as in EKF-SLAM. By analyzingin detail the computational requirements of each EKF update, it can beshown that when many features are present there is no gain toinitializing the state vector if any feature is observed fewer than mtimes. Thus, in one embodiment of the system the strategy for using thefeatures is as follows: if feature i's track is lost after fewer than mframes (i.e., l_(i)<m), then the feature is processed using the MSCKFequations, as described. If a feature is still actively being trackedafter m images, it is initialized into the state vector, and used forthe described SLAM computation. Therefore, the system can determine thesize of the sliding window, m.

At a given time step, the EKF update is carried out using a residualvector constructed by stacking together the residual computed from anumber of “MSCKF features” and from a number s_(k) of observations of“SLAM features”:

$\begin{matrix}{{\overset{\sim}{z}}_{k} = {{\begin{bmatrix}{\overset{\sim}{z}}_{k}^{r} \\{\overset{\sim}{z}}_{1m} \\\vdots \\{\overset{\sim}{z}}_{skm}\end{bmatrix} \simeq {{\begin{bmatrix}H^{r} \\H_{1m} \\\vdots \\H_{skm}\end{bmatrix}{\overset{\sim}{x}}_{k}} + n_{k}}} = {{H_{k}{\overset{\sim}{x}}_{k}} + n_{k}}}} & \left( {{Eq}.\mspace{14mu} 15} \right)\end{matrix}$

In the above equation, {tilde over (z)}_(jm), for j=1 . . . s_(k) arethe residuals of the observations of the SLAM features from the latestcamera state (state m), and H_(jm), for j=1 . . . s_(k) are theassociated Jacobian matrices. Each of these residuals is a 2×1 vector,while each H_(jm) is a 2×(15+6m+3 s_(k)) matrix (here 15+6m+3s_(k) isthe size of the state covariance matrix). The residual {tilde over(z)}_(k) and the Jacobian H_(k) are used for updating the state andcovariance matrix.

For initializing new features, the m measurements of a feature are usedto triangulate it and to compute its initial covariance and thecross-correlation with other filter states. In our implementation, weuse the inverse-depth feature parameterization, due to its superiorlinearity properties. The latest camera clone, x_(C) _(m) , is used asthe “anchor” state, based on which the inverse-depth parameters arecomputed. If the feature is still actively being tracked at the time itsanchor state needs to be removed from the sliding window, the anchor ischanged to the most recent camera state, and the covariance matrix ofthe filter is appropriately modified. The hybrid MSCKF/SLAM algorithm isdescribed in Algorithm 1.

Algorithm 1 Hybrid MSCKF/SLAM Algorithm

Propagation: Propagate the state vector and covariance matrix using theIMU readings. Once camera measurements become available:

Augment the state vector with the latest camera pose.

For features to be processed in the MSCKF (feature tracks of lengthsmaller than m), do the following:

-   -   For each feature to be processed, calculate the residual and        Jacobian matrix in (Eq. 5).    -   Perform the Mahalanobis gating test in (Eq. 6).    -   Using all features that passed the gating test, form the        residual vector and the Jacobian matrix in (Eq. 8).

For features that are included in the state vector, compute theresiduals and measurement Jacobian matrices, and form the residual{tilde over (z)}_(k) and matrix H_(k) in (Eq. 15).

Update the state vector and covariance matrix, via (Eq. 9)-(Eq. 12),using the residual {tilde over (z)}_(k) and Jacobian matrix H_(k).

Initialize features tracked in all m images of the sliding window. StateManagement:

Remove SLAM features that are no longer tracked, and change the anchorpose for SLAM features anchored at the oldest pose.

Remove the oldest camera pose from the state vector. If no feature iscurrently tracked for more than m_(o) poses (with m_(o)<m−1), remove theoldest m−m_(o) poses.

The system can select the size of the sliding window, m, to reduce thecomputational cost of the hybrid MSCKF/SLAM filter. The choice of m canhave a profound effect on the time requirements of the algorithm. With asuitable choice, the hybrid method can significantly outperform each ofits individual components.

By carefully analyzing the computations needed, the system can calculatethe number of floating-point operations per update of the hybridalgorithm:

$\begin{matrix}\begin{matrix}{f_{k} = {{\alpha_{1}{\sum\limits_{i = 1}^{n}\; _{i}^{3}}} + {\alpha_{2}\left( {r + {2s_{k}}} \right)}^{3} +}} \\{{{{\alpha_{3}\left( {r + {2s_{k}}} \right)}\left( {15 + {6m} + {3s_{k}}} \right)^{2}} + {l.o.t}}}\end{matrix} & \left( {{Eq}.\mspace{14mu} 16} \right)\end{matrix}$

where the α_(i)'s are known constants, n is the number of features usedin the MSCKF, r is defined in (13), and l.o.t. stands for lower-orderterms. The full expression for the operation count consists of tens ofindividual terms, whose inclusion would merely complicate thedescription and not add much insight. The three terms shown abovecorrespond to the computation of the MSCKF residual, the Choleskyfactorization of the matrix S in (Eq. 12), and the covariance updateequation, respectively. Note that (Eq. 16) also models the probabilityof failure for the Mahalanobis test.

It is interesting to examine the properties of (Eq. 16). First, notethat since r represents the number of constraints provided by the MSCKFfeatures for the poses in the sliding window, it is bounded above by6m−7: the total number of unknowns in the sliding window is 6m, and thefeature measurements cannot provide any information about the globalpose or scale, which correspond to 7 degrees of freedom. If manyfeatures are available, the system will have r≈6m−7, and thus:

$\begin{matrix}\begin{matrix}{f_{k} \approx {{\alpha_{1}{\sum\limits_{i = 1}^{n}\; _{i}^{3}}} + {\alpha_{2}\left( {{6m} + {2s_{k}} - 7} \right)}^{3} +}} \\{{{\alpha_{3}\left( {{6m} + {2s_{k}} - 7} \right)}\left( {15 + {6m} + {3s_{k}}} \right)^{2}}}\end{matrix} & \left( {{Eq}.\mspace{14mu} 17} \right)\end{matrix}$

This approximate expression makes it possible to gain intuition aboutthe behavior of the computational cost of the hybrid method: as the sizeof the sliding window, m, increases, more features will be processed bythe MSCKF, and fewer features will be included in the state vector ofthe filter. Thus, as m increases, the term Σ_(i=1) ^(n)l_(i) ³ willincrease, but s_(k) will decrease rapidly. These two opposing trendsresult in the performance curves shown in FIGS. 3 a and 3 b.

In one embodiment, the system determines, in real time, the optimalvalue of m in order to minimize the runtime of the hybrid EKF estimator.Equation 16 provides the operation count of one filter update, so atfirst glance, it may appear that the system would need to minimize thisequation with respect to m, at each time instant. However, that would bean ill-posed problem. To see why, consider the case where, at time stepk, the sliding window has length m=20, and ten features exist that havebeen continuously tracked in 20 images. At this time, the system caneither increase the size of the sliding window, or include the tenfeatures in the state vector. Which is best depends on the futurebehavior of the feature tracks. If the features end up being tracked fora very large number of frames (>>20), then it would be preferable toinclude the features in the state vector. If, on the other hand, thefeatures end up being tracked for only 21 frames, it would be preferableto increase m by one.

Clearly, it is impossible to obtain future information about anyparticular feature track. The system can however, collect statisticalinformation about the properties of the feature tracks, and use thisinformation to minimize the expected cost of the EKF updates. Thisapproach is implemented in one embodiment of the system. Specifically,during the filter's operation, the system collects statistics to learnthe probability mass function (pmf) of the feature track lengths, theprobability of failure of the Mahalanobis gating test, as well as thepmf of the number of features tracked in the images. Using the learnedpmfs, the system computes the average number of operations needed foreach EKF update, f(m), by direct application of the definition of theexpected value of a function of random variables.

The value of m that yields the minimum f(m) can be found by exhaustivesearch among all possible values. However, the cost curve in manypractical cases is quasiconvex, which means that it has a unique minimum200. Therefore, to reduce the time needed for the optimization, thesystem can perform the optimization by local search starting from aknown good initial guess (e.g., the last computed value of m). Since thestatistical properties of the feature tracks can change over time (seeFIG. 1), the system can perform the learning of the pmfs as well as theselection of m, in consecutive time windows spanning a few seconds (15sec in one implementation).

It is worth noting that in modern computers the use of flop counts tomodel computational cost is not always suitable, as performance isaffected by several factors including vectorization, cache accesspatterns, data locality, etc. However, it has been experimentallyverified that in the algorithms considered here, and for ourimplementation, the computed flop counts closely follow the observedruntimes. Specifically, FIG. 2 shows the actual runtime of the hybridfilter, as well as the value f(m) calculated analytically, for aspecific trajectory. As can be observed, the two curves are verysimilar, with the only significant differences observed at the two“extremes” of very small or very large m. These regions are lessimportant, however, as they fall outside the typical operational regionof the hybrid filter. Thus, using f(m) as the objective function tominimize is appropriate.

To test the system simulation data was generated that closely matchedreal-world datasets, to have as realistic a test environment aspossible. A first simulation environment is based on a dataset thatconsists of a 29.6-km long trajectory through canyons, forested areas,and a town. A ground truth trajectory (position, velocity, orientation)was generated that matches the vehicle's actual trajectory, as computedby a high-precision NS system. Using this trajectory, IMU measurementscorrupted with noise and bias characteristics identical to those of theXsens MTi-G sensor used in the dataset were subsequently generated.Moreover, monocular feature tracks with statistical characteristicssimilar to those of the features extracted in the actual dataset werealso generated. The number of features observed in each image and thefeature track distribution change in each part of the trajectory, as inthe actual dataset (see FIG. 1). Overall, there are 232 featuresobserved in each image on average, and the average feature track lengthis 5.6 frames. The IMU measurements are available at 100 Hz, while thecamera frame rate is 20 Hz. All the results reported are averages over50 Monte-Carlo trials.

Referring now to FIGS. 3 a and 3 b, there are diagrams 300 showing theresults of a Monte-Carlo simulation comparing timing performance on alaptop computer and root mean square position accuracy of a hybridfilter, for changing values of m. Specifically, FIG. 3 a shows theaverage runtime for each update of the hybrid algorithm. The solid blueline is the average time when the threshold m is chosen in advance, andkept constant for the entire trajectory. The red dashed line denotes theruntime achieved when applying an optimization process that chooses thethreshold in each time window, to adapt to local feature properties.This plot shows that, by optimizing the value of m in real time, we canattain performance higher than that of any fixed threshold. Note thatwhen m is large, no features are initialized, and thus the right-mostpart of the plot gives the performance of the plain MSCKF (similarly,for small m we obtain pure EKF-SLAM). Therefore, from this plot we cansee that the optimal hybrid filter has a runtime 37.17% smaller thanthat of the MSCKF, and 72.8% smaller than EKF-SLAM.

In the second subplot FIG. 3 b there is a plot of the RMS positionerror, averaged over all Monte-Carlo trials and over the duration of thetrajectory. As can be seen, the plain MSCKF results in the highestaccuracy. This can be attributed to two causes. First, in the MSCKFfeatures are explicitly marginalized, and thus no Gaussianityassumptions are needed for the pdf of the feature position errors (as isthe case in SLAM). Second, all the measurements of each feature are usedjointly in the MSCKF, which means that outliers can be more easilydetected, and better linearization points can be computed. By combiningthe MSCKF with EKF-SLAM some accuracy can be lost, as the errors for thefeatures included in the state vector are now assumed to be Gaussian.However, as can be seen, if the size of the sliding window increasesabove a moderate value (e.g., 9 in this case), the change in theaccuracy is almost negligible. Intuitively, when a sufficient number ofobservations are used to initialize features, the feature errors' pdfbecomes “Gaussian enough” and the accuracy of the hybrid filter is veryclose to that of the MSCKF. Based on these results, the system does notallow the value of m to fall below a certain threshold. In a preferredembodiment, the value of m is set to 7.

The timing results presented in FIGS. 3 a and 3 b were obtained on alaptop computer with a Core i7 processor at 2.13 GHz, and asingle-threaded C++ implementation. Clearly, if the data were to beprocessed on this computer, the timing performance would easily allowfor real-time implementation (the hybrid filter requires fewer than 4msec per update with optimal m). However, our primary interest is inimplementing pose estimation on small portable devices. For this reason,we conducted a second set of tests, in which the data were processed ona Samsung Galaxy S2 mobile phone, equipped with a 1.2-GHz Exynos 4210processor. For these tests, we ported our C++ implementation to Androidusing the Android NDK. The simulation data were produced in a similarway as described above, by emulating a real-world dataset collectedwhile driving in a residential area of Riverside, Calif. The vehicletrajectory and statistics of the dataset (e.g., feature distribution,feature track length, and so on) are different, allowing us to test theproposed algorithm in different situations.

Referring now to FIGS. 4 a and 4 b, there are shown diagrams 400 showingof a Monte-Carlo simulation comparing timing performance and root meansquare position accuracy of the hybrid filter, for changing values of mon a mobile phone using a different data set than used in the simulationof FIG. 3. FIG. 4 a shows the results of the hybrid filter in thissetting. These results are very similar to what was observed in thefirst simulation, with the optimal hybrid filter outperforming each ofthe individual algorithms by a wide margin (runtime 45.8% smaller thanthe MSCKF, and 55.6% smaller than SLAM). More importantly, however, weobserve that the hybrid filter is capable of processing the data atreal-time speeds, even on the much less capable processor of the mobilephone. Specifically, the average time needed for each update of thehybrid filter with optimally chosen thresholds is 33.78 msec,corresponding to a rate of 30 images per second. Since the images arerecorded at 20 Hz, this means that the proposed hybrid estimator iscapable of real-time processing, something that is not possible with theany of the individual methods.

Referring now to FIG. 5, there is shown a map 500 showing the trajectoryestimates of MSCKF, EKF-SLAM, and the hybrid algorithm in the real-worldexample. In addition to the synthetic datasets described above, weemployed our proposed algorithm for processing the feature measurementsrecorded during a real-world experiment. During this experiment anIMU/camera platform was mounted on top of a car and driven on citystreets. The sensors consisted of an Inertial Science ISIS IMU and aPointGrey Bumblebee2 stereo pair (only a single camera's images areused). The IMU provides measurements at 100 Hz, while the camera imageswere stored at 10 Hz. Harris feature points are extracted, and matchingis carried out by normalized cross-correlation. The vehicle trajectoryis approximately 5.5 km long, and a total of 7922 images are processed.

In this dataset, to compensate for the low frame rate of the images, weuse lower feature-detection thresholds, which leads to a larger numberof features. Specifically, 540 features are tracked in each image onaverage, and the average track length is 5.06 frames. This substantiallyincreases the overall computational requirements of all algorithms. Whenrun on the mobile phone's processor, the average time per update is 139msec for the MSCKF, 774 msec for EKF-SLAM, and 77 msec for the hybridfilter with optimally selected thresholds. The trajectory estimates 500for each of the three methods are plotted on a map of the area where theexperiment took place. We observe that the accuracy of the MSKCF and thehybrid filter are similar, and substantially better than that of theEKF-SLAM.

Referring now to FIG. 6, there is shown a chart 600 showing thresholdsselected using the optimization process for the real-world data set ofFIG. 5. As can be seen in FIG. 6, there is a plot of the computed valuesfor the optimal m during the experiment. This figure shows that, due tothe changing properties of the feature tracks' distribution, the optimalvalue varies considerably over time, justifying the need for periodicre-optimization in this embodiment. As a final remark, it is noted thatthe optimization process itself is computationally inexpensive. In oneembodiment, the optimal threshold is re-computed every 15 sec, and thisprocess takes 0.31 msec, on average, on the mobile phone. Therefore, theoptimization takes up less than 0.003% of the overall processing time,while resulting in substantial performance gains.

Referring now to FIG. 7, there is shown a flowchart diagram 700 of somesteps of a method for using a hybrid estimator using visual and inertialsensors for real-time pose tracking on devices with limited processingpower, according to one embodiment. First, a determination as to what amethod is to be used by the system for processing each of the featuremeasurements is performed 702. Then, the system is adapted to thecharacteristics of the environment, the trajectory of the moving object,and the availability of processing resources 704. Next, a determinationas to the number of features to be extracted from raw sensor data ismade in order to adjust to the availability of computational resources706. Finally, linearized approximations of the nonlinear mathematicalmodels that describe the motion and the sensor measurements areconstructed 708.

Referring now to FIG. 8, there is shown a flowchart diagram 800 of somesteps of a method for making a hybrid estimator using visual andinertial sensors for real-time pose tracking on devices with limitedprocessing power, according to one embodiment. First, a devicecomprising at least one processor, a memory, a storage andcommunications through a protocol is provided. Then, one or more thanone module, communicatively coupled to each other, comprising codeexecutable on the processor is provided. Next, a hybrid estimator isinitialized 802. Then, a real-time algorithm to process differentmeasurements is selected 804. Next, statistical learning for thesecharacteristics is performed to compute the expected device computingcost of any strategy for allocating measurements to algorithms 806.Finally, an algorithm is selected based on the statistical learningmodule 808.

What has been described is a new hybrid estimator system using visualand inertial sensors for real-time pose tracking, overcoming limitationsand disadvantages inherent in the related art.

Although the present invention has been described with a degree ofparticularity, it is understood that the present disclosure has beenmade by way of example and that other versions are possible. As variouschanges could be made in the above description without departing fromthe scope of the invention, it is intended that all matter contained inthe above description or shown in the accompanying drawings shall beillustrative and not used in a limiting sense. The spirit and scope ofthe appended claims should not be limited to the description of thepreferred versions contained in this disclosure.

All features disclosed in the specification, including the claims,abstracts, and drawings, and all the steps in any method or processdisclosed, may be combined in any combination, except combinations whereat least some of such features and/or steps are mutually exclusive. Eachfeature disclosed in the specification, including the claims, abstract,and drawings, can be replaced by alternative features serving the same,equivalent or similar purpose, unless expressly stated otherwise. Thus,unless expressly stated otherwise, each feature disclosed is one exampleonly of a generic series of equivalent or similar features.

Any element in a claim that does not explicitly state “means” forperforming a specified function or “step” for performing a specifiedfunction should not be interpreted as a “means” or “step” clause asspecified in 35 U.S.C. §112.

What is claimed is:
 1. A hybrid estimator system using sensors forreal-time pose tracking of moving platforms, the system comprising: a) adevice comprising at least one processor, a memory, a storage andcommunications through a protocol; and b) one or more than one module,communicatively coupled to each other, comprising code executable on theprocessor for 1) a hybrid estimator, comprising a plurality ofalgorithms for processing measurements; and 2) selecting the constituentalgorithms of the hybrid estimator used to process each measurement. 2.The system of claim 1, where a module comprising non-statutoryinstructions for learning the statistical characteristics of themeasurements is used.
 3. The system of claim 1, where the constituentalgorithm selection is based on the learning module.
 4. The system ofclaim 1, where the system gathers information to compute the expectedcost of strategies for processing measurements by different algorithmsto reduce the computational requirements.
 5. The system of claim 1,where the system solves, in real time, an optimization problem with anobjective function representing the expected computation time toidentify the preferred strategy.
 6. The system of claim 1, where thehybrid estimator estimates a moving platform's trajectory using inertialmeasurements and the observations of features by one or more than onesensors.
 7. The system of claim 1, where the hybrid estimator estimatesa moving platform's trajectory using inertial measurements and theobservations of features by a visual sensor.
 8. The system of claim 1,where the system comprises non-transitory instructions to process eachof the available feature measurements.
 9. The system of claim 8, wherethe feature measurements can be processed by including a description ofthe features as variables to be estimated in the hybrid estimator. 10.The system of claim 8, where the feature measurements can be processedby using obtained observations in order to derive constraints, directlyused for updating the pose estimates of the moving object.
 11. Thesystem of claim 1, where the system comprises non-transitoryinstructions to improve the accuracy of the pose estimates, reduce thecomputational requirements of the estimator or both improve the accuracyof the pose estimates and reduce the computational requirements of theestimator during system operation to reduce the processing requirements.12. The system of claim 1, where the hybrid estimator comprisesnon-transitory instructions to determine which of the plurality ofmethods is to be used for processing each of the feature measurements toreduce the computational requirements of the estimator and to adapt thesystem to the characteristics of the environment, the trajectory of themoving object, and the availability of processing resources.
 13. Thesystem of claim 1, where the hybrid estimator further comprisesinstructions for a module that determines the number of features thatshould be extracted from raw sensor data in order to adjust to theavailability of computational resources.
 14. The system of claim 1,where the hybrid estimator comprises non-transitory instructions toconstruct linearized approximations of the nonlinear mathematical modelsthat describe the motion and the sensor measurements in order to computea description of the uncertainty of the estimates.
 15. The system ofclaim 14, where the linearization points are selected to preserve thesystem's observability properties.
 16. The system of claim 15, where aunique estimate of certain states is used in order to compute thelinearization matrices for all measurement equations that involve eachof the states.
 17. The system of claim 15 where the estimates of one ormore than one state is used to compute the linearization matrices andcan be modified by equal amounts, to reduce linearization inaccuracywhile preserving the system's observability properties.
 18. The systemof claim 1, where the hybrid estimator comprises non-transitoryinstructions to implement a hybrid extended Kalman filter.
 19. Thesystem of claim 1, where the hybrid extended Kalman filter comprises anextended-Kalman filter algorithm that includes feature states in thestate vector, and a sliding-window extended-Kalman-filter algorithm thatincludes states of the mobile platform in the state vector.
 20. Thesystem of claim 19, where the size of the sliding window is selected toreduce the computational cost of the hybrid extended Kalman filter. 21.The system of claim 1, where the hybrid estimator module determines thechoice of algorithm to process each individual feature depending on thedistribution of the feature track lengths of features.
 22. The system ofclaim 21, where the hybrid estimator module determines the optimalstrategy for processing the feature measurements by solving aone-variable optimization problem using the information above.
 23. Thesystem of claim 1, where the hybrid estimator module processes allavailable measurements without loss of localization information.
 24. Thesystem of claim 1, where the plurality of algorithms must have boundedcomputational complexity, irrespective of the duration of thetrajectory.
 25. The system of claim 24, where the plurality ofalgorithms employ an extended Kalman filter.
 26. The system of 1, wherethe hybrid estimator includes an extended Kalman filter-sliding-windowiterative minimization algorithm, where the state vector contains acurrent IMU state as well as representations of the feature positions.27. The system of claim 26, where the features that leave the field ofview are removed from the state vector leaving only the currentlyobserved ones, to keep the computations bounded.
 28. The system of claim26, where the hybrid estimator comprises other extended-Kalman-filteralgorithms that maintain a sliding window of camera poses in the statevector, and use the feature observations to apply probabilisticconstraints between these poses.
 29. The system of claim 26, where thehybrid estimator comprises a multistate-constraint Kalman filter. 30.The system of claim 26, where the algorithm used to process featuremeasurements is selected to have the lowest computational cost.
 31. Thesystem of claim 26, where the hybrid filter is a combination of both theextended Kalman filter-sliding-window iterative minimization and themultistate-constraint Kalman filter algorithms.
 32. The system of claim26, where the hybrid filter is a filter whose state vector contains thecurrent IMU state, m camera poses, and s_(k) features, and determineswhether a feature will be processed using the multistate-constraintKalman filter algorithm, or whether it will be included in the statevector and processed using the extended Kalman filter-sliding-windowiterative minimization algorithm.
 33. A method for using a hybridestimator using visual and inertial sensors for real-time pose trackingon devices with limited processing power, the method comprising thesteps of: a) determining a method to be used by the system forprocessing each of the feature measurements; b) adapting the system tothe characteristics of the environment, the trajectory of the movingobject, and the availability of processing resources; c) determining thenumber of features to be extracted from raw sensor data in order toadjust to the availability of computational resources; and d)constructing linearized approximations of the nonlinear mathematicalmodels that describe the motion and the sensor measurements.
 34. Amethod for making a hybrid estimator using visual and inertial sensorsfor real-time pose tracking on devices with limited processing power,the method comprising the steps of: a) providing a device comprising atleast one processor, a memory, a storage and communications through aprotocol; and b) providing one or more than one software module,communicatively coupled to each other, comprising code executable on theprocessor for: 1) a hybrid estimator; 2) real-time algorithm selectionto process different measurements; 3) statistical learning for thesecharacteristics to compute the expected device computing cost of anystrategy for allocating measurements to algorithms; and 4) algorithmselection based on the statistical learning module.
 35. The method ofclaim 34 further comprising the steps of: a) propagating the statevector and covariance matrix using the IMU readings; b) determining whencamera measurements and features are available; c) augmenting the statevector with the latest camera pose; d) determine if the features are tobe processed using an multistate-constraint Kalman filter algorithm;where if the features are to be processed the perform the followingsteps: 1) calculating a residual and Jacobian matrix for each feature tobe processed; 2) performing a Mahalanobis gating test; and 3) forming aresidual vector and a Jacobian matrix using all features that passed thegating test; e) computing residuals and measurement Jacobian matrices,and form the residual {tilde over (z)}_(k) and Jacobian matrix H_(k) forfeatures that are included in the state vector; f) updating the statevector and covariance matrix using the residual {tilde over (z)}_(k) andJacobian matrix H_(k); and g) initialize features tracked in all imagesof the sliding window; h) updating state management by performing thefollowing steps: 1) removing sliding-window iterative minimizationfeatures that are no longer tracked; and 2) changing the anchor pose forsliding-window iterative minimization features anchored at the oldestpose; and i) removing oldest camera poses from the state vector.
 36. Themethod of claim 35 further comprising the steps of: a) analyzing thecomputations needed for the hybrid estimator; and b) calculating thenumber of floating-point operations per update of the hybrid algorithm.